Automatic learning optimizer

ABSTRACT

A method of gathering performance information about a workload, and automatically identifying a set of high-load database query language statements from the workload based on the performance information, is disclosed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/500,490, filed Sep. 6, 2003, which is incorporated herein by reference in its entirety. This application is related to co-pending applications “SQL TUNING SETS,” Ser. No. ______ Attorney Docket No. OI7036272001; “AUTO-TUNING SQL STATEMENTS,” Ser. No. ______ Attorney Docket No. OI7037042001; “SQL PROFILE,” Ser. No. ______ Attorney Docket No. OI7037052001; “GLOBAL HINTS,” Ser. No. ______ Attorney Docket No. OI7037062001; “SQL TUNING BASE,” Ser. No. ______ Attorney Docket No. OI7037072001; “AUTOMATIC PREVENTION OF RUN-AWAY QUERY EXECUTION,” Ser. No. ______ Attorney Docket No. OI7037092001; “METHOD FOR INDEX TUNING OF A SQL STATEMENT, AND INDEX MERGING FOR A MULTI-STATEMENT SQL WORKLOAD, USING A COST-BASED RELATIONAL QUERY OPTIMIZER,” Ser. No. ______ Attorney Docket No. OI7037102001; “SQL STRUCTURE ANALYZER,” Ser. No. ______ Attorney Docket No. OI7037112001; “HIGH-LOAD SQL DRIVEN STATISTICS COLLECTION,” Ser. No. ______ Attorney Docket No. OI7037122001; “AUTOMATIC SQL TUNING ADVISOR,” Ser. No. ______ Attorney Docket No. OI7037132001, all of which are filed Sep. 7, 2004 and are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

This invention is related to the field of electronic database management systems.

BACKGROUND

SQL tuning is a critical aspect of database performance tuning. It is an inherently complex activity requiring a high level of expertise in several domains: query optimization, to improve the execution plan selected by the query optimizer; access design, to identify missing access structures; and SQL design, to restructure and simplify the text of a badly written SQL statement. Furthermore, SQL tuning is a time-consuming task due to the large volume and evolving nature of the SQL workload and its underlying data.

Over the past decade two clear trends have occurred: (a) database systems have been deployed in new areas, such as electronic commerce, bringing a new set of database requirements, and (b) database applications have become increasingly complex, with support for very large numbers of concurrent users. As a result, the performance of database systems has become highly visible and thus critical to the success of the businesses running these applications. For example, database systems continue to be deployed in new areas, such as electronic commerce, and database applications have become increasingly sophisticated to support more users and provide more functionality, making the query optimization task more complex.

The database system vendors deal with this increased complexity by enhancing the optimizer capabilities to deal with new SQL constructs, adding better searching techniques, or providing a richer cost model. While this conventional approach can solve some problems, it cannot deal with the dynamic nature of the database application, e.g., dynamic changes in the application workload. Indeed, the conventional optimizer will always face situations where mistakes are unavoidable. For example, the optimizer can lack information about the objects accessed by a SQL statement. The optimizer logic may also not be prepared to deal with certain kinds of problems.

SUMMARY

An automatic learning optimizer is able to automatically tune a database query language statement by automatically identifying high load or top database query language statements that are responsible for a large share of the application workload and system resources based on the past database query language statement execution history available in the system, automatically generating ways to improve execution plans produced by a compiler for these statements, and automatically performing corrective actions to generate better execution plans for poorly performing statements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of the Automatic Learning Optimizer architecture.

FIG. 2 shows an example of the process flow of the auto-learning process.

FIG. 3 represents another exemplary illustration of the automatic learning optimizer device.

FIG. 4 shows another example of a system to perform the auto-learning process for database system management.

FIG. 5 is a block diagram of a computer system suitable for implementing an embodiment of coverage computation for verification.

DETAILED DESCRIPTION

Overview

The embodiments of the invention are described using the term “SQL”; however, the invention is not limited to this exact database query language, and indeed may be used in conjunction with other database query languages and constructs.

An automatic learning optimizer has learning capabilities that make it able to learn from past execution history of SQL statements which can be repeated in the future. For example, a database application is often repetitive, i.e., the same SQL statements are submitted over and over, some more frequently than others. Information about these statements can be collected from various sources, e.g., execution statistics for some or all operations of the query execution plan (number of output rows, amount of memory, number of disk reads), or the caching ratio of the database objects.

An auto-learning capability for the auto-tuning optimizer provides a component of a fully self-managed database system. The auto-learning optimizer can execute a background task which identifies potential optimizer mistakes made for a target SQL statement, and automatically produces optimizer corrective actions. Hence, the auto-learning optimizer can gradually repair the suboptimal execution plans run by the database system.

The auto-learning process starts by identifying a small subset of SQL statements which are potential candidates for auto-learning. For example, this subset may correspond to all SQL statements which are known to have a suboptimal execution plan and a high impact on the overall performance of the system. Alternatively, the auto-learning detection mechanism may focus on the subset of high-load SQL statements. Once the set of high-load SQL statements has been identified, the auto-learning optimizer can uncover its mistakes by analyzing each SQL statement in the set. Based on this analysis, if optimizer-related problems are found, corrective actions are produced and stored in a computer readable medium, such as a disk. Because the corrective actions are permanently stored, the auto-learning optimizer can perform an iterative learning process which accumulates, over time, more and more knowledge on problematic queries, and also prevents the corrected problems from recurring.

In one embodiment, the auto-learning optimizer performs the auto-tuning function to generate the corrective actions and store them in SQL Profiles. Hence, corrective actions are automatically placed in one or more profiles, which are stored in the tuning base. The auto-learning process may be on-line, with the auto-learning process running almost continuously as a background task. In this mode, high-load SQL statements stored in the cursor cache are targeted. Hence, the on-line mode can address critical SQL tuning issues while having a very low overhead on the system performance.

In another embodiment, the auto-learning optimizer performs learning off-line. In this mode, the auto-learning process is executed during the maintenance window as an automated manageability task. This off-line mode can have more time and system resources to perform the auto-tuning functions. Hence, the auto-learning optimizer can have time to refresh corrective actions produced in the past, and auto tune less critical SQL statements.

FIG. 1 shows an example of the auto-learning optimizer feedback loop process for the on-line auto-learning optimizer. The on-line auto-learning background process is performed by the auto-learning optimizer 110, which identifies the set of high-load SQL statements from the cursor cache 120. This can be achieved by keeping, for each cursor in the cache, a record of the total elapsed time accumulated since the last auto-learning cycle. When an auto-learning cycle starts, cursors in the cache can be ranked based on this cumulative elapsed time metric. The auto tune process is then applied to the cursors in ranking order. For example, in one embodiment, cursors which account for more than 1% of the total cumulative elapsed time are considered by the auto-learning optimizer 110.
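The ranking step described above can be sketched as follows. This is a minimal illustration only, assuming each cached cursor exposes the elapsed time it has accumulated since the last auto-learning cycle; the `CursorStats` type, its field names, and the `select_high_load` function are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class CursorStats:
    sql_id: str                 # hypothetical identifier of a cached cursor
    elapsed_since_cycle: float  # seconds accumulated since the last cycle


def select_high_load(cursors, min_share=0.01):
    """Rank cursors by cumulative elapsed time and keep those whose share
    of the total exceeds min_share (1% in the embodiment above)."""
    total = sum(c.elapsed_since_cycle for c in cursors)
    if total <= 0:
        return []
    ranked = sorted(cursors, key=lambda c: c.elapsed_since_cycle, reverse=True)
    return [c for c in ranked if c.elapsed_since_cycle / total > min_share]


# Illustrative figures only: q3 contributes 0.5% of the total and is dropped.
cache = [CursorStats("q3", 5.0), CursorStats("q2", 375.0), CursorStats("q1", 620.0)]
targets = [c.sql_id for c in select_high_load(cache)]
print(targets)
```

The returned list is already in tuning order, so the most expensive cursor is auto tuned first.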

The auto-learning optimizer 110 can also auto tune recurring cursors with a long average elapsed time, even when their cumulative elapsed time is not necessarily high, because these cursors could negatively affect the response time of individual end users even if their overall impact on the system performance is limited. A ranking procedure can be used for these cursors by ranking them based on the average elapsed time (instead of the cumulative elapsed time).

The auto-tuning process may skip cursors which have not been executed more than once since they were first loaded in the cursor cache, in order to prevent the auto-learning process from spending time on non-recurring cursors (e.g., ad-hoc queries). Also, cursors with hintsets or outlines already defined on them may be skipped, since these cursors have been tuned already.
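The skip rules and the average-elapsed-time ranking can be illustrated together. The sketch below is hypothetical: the `Cursor` fields, `is_candidate`, and `rank_by_average` are names introduced here for illustration, under the assumption that an execution count and a flag for existing hintsets or outlines are available per cursor.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    sql_id: str                        # hypothetical cursor identifier
    executions: int                    # executions since load into the cache
    total_elapsed: float               # seconds, cumulative
    has_hintset_or_outline: bool = False


def is_candidate(cursor):
    """Skip non-recurring cursors and cursors that are already tuned."""
    return cursor.executions > 1 and not cursor.has_hintset_or_outline


def rank_by_average(cursors):
    """Order eligible cursors by average elapsed time, longest first."""
    eligible = [c for c in cursors if is_candidate(c)]
    return sorted(eligible, key=lambda c: c.total_elapsed / c.executions,
                  reverse=True)


cursors = [
    Cursor("b", executions=1000, total_elapsed=100.0),  # average 0.1 s
    Cursor("a", executions=2, total_elapsed=40.0),      # average 20 s
    Cursor("c", executions=1, total_elapsed=300.0),     # ad-hoc, skipped
    Cursor("d", executions=5, total_elapsed=50.0,
           has_hintset_or_outline=True),                # already tuned, skipped
]
order = [c.sql_id for c in rank_by_average(cursors)]
print(order)
```

In this example cursor "a" ranks ahead of "b" even though "b" has the larger cumulative elapsed time, which is the situation the average-based ranking is meant to capture.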

Once the set of cursors to auto tune is identified, the auto-learning optimizer starts the learning process. This process can run continuously in the background, up to the beginning of the next auto-learning cycle, which reduces the overall impact on the system. This background task may be accomplished by pacing each auto tune execution based on the total load of the system. Profiles produced for each auto-tuned statement may be stored in the Tuning base, even if no corrective actions have been produced. In that case, a VOID profile may be generated to keep track of statements which have been auto tuned.
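One possible way to pace the background task and to record VOID profiles is sketched below. The linear pacing rule, the delay bounds, and the dictionary used as a tuning base are all assumptions made for illustration; the description above does not specify how pacing is computed or how profiles are represented.

```python
VOID = "VOID"  # marker for a statement that was tuned without corrective actions


def pacing_delay(system_load, min_delay=1.0, max_delay=60.0):
    """Seconds to wait before the next auto tune run.

    Assumes system_load is a fraction in [0, 1]; the delay grows linearly
    with load so that tuning backs off when the system is busy."""
    load = min(max(system_load, 0.0), 1.0)
    return min_delay + load * (max_delay - min_delay)


def record_profile(tuning_base, sql_id, corrective_actions):
    """Store the profile, or a VOID marker when tuning produced nothing."""
    tuning_base[sql_id] = list(corrective_actions) if corrective_actions else VOID


tuning_base = {}
record_profile(tuning_base, "q1", ["corrective-action-placeholder"])
record_profile(tuning_base, "q2", [])
print(tuning_base, pacing_delay(0.5))
```

Recording the VOID marker lets a later cycle recognize that "q2" has already been examined.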

Once a SQL statement is auto tuned, the learned corrective actions (i.e., a profile) are permanently stored in the Tuning base 130. The next execution of that statement will automatically trigger a hard parse, except when the associated hintset is ‘VOID’. When the statement is recompiled, the query optimizer 140 takes into account the extra knowledge carried by the profile to generate an improved execution plan.
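The hard-parse rule can be expressed as a small predicate. This is a sketch under stated assumptions: the tuning base is modeled as a dictionary keyed by a hypothetical statement identifier, and `needs_hard_parse` is an illustrative name rather than an interface of the described system.

```python
VOID = "VOID"  # hintset value meaning "tuned, no corrective actions"


def needs_hard_parse(tuning_base, sql_id):
    """True when a non-VOID profile exists, so the next execution should be
    recompiled with the profile; False for VOID or missing profiles."""
    profile = tuning_base.get(sql_id)
    return profile is not None and profile != VOID


tuning_base = {
    "q1": ["corrective-action-placeholder"],  # real profile: recompile
    "q2": VOID,                               # nothing to apply: keep the plan
}
decisions = {s: needs_hard_parse(tuning_base, s) for s in ("q1", "q2", "q3")}
print(decisions)
```

Statement "q3" has not been auto tuned at all, so in this sketch the tuning base gives no reason to recompile it.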

The execution plan produced by query optimizer 140 is sent to the cursor cache 120, which closes the auto-learning feedback loop. The statements that have been improved by the auto-learning optimizer then have execution plans that are no longer high-load. They drop out of the set of high-load SQL identified by the auto-learning optimizer 110, because they are no longer high-load, or because a tuning profile now exists for them. As the automatic learning process continues, the formerly lesser high-load statements will rise in rank to the point that they become targets for the auto-learning optimization process. With a stable SQL workload, the system can rapidly converge with all high-load statements auto-tuned.

An example of a method of performing the on-line auto-learning process is shown in FIG. 2. High load SQL statements are identified, 210. The high load statements are ranked based on a performance metric, such as execution time, 220. Each high load statement is tuned by the automatic tuning optimizer based on its rank, 230. A profile of tuning actions is created for the high load statement, 240, which is stored in a tuning base, 250. A query optimizer generates an execution plan for the statement using the profile from the tuning base, 260.

The off-line auto-learning process can be performed by the auto-learning optimizer system as shown in FIG. 3. Workload information is automatically captured at regular intervals, by default once every 30 minutes, from the cursor cache 310 by snapshot collector 320, and placed in the workload repository 330. The off-line functions of the auto-learning process can be executed as an automated manageability task, scheduled to be performed within the maintenance window. For example, a database administrator (DBA) can define that automatic manageability actions (e.g., index rebuild, space coalesce, auto analyze, auto learn) are to be executed at night from 10 pm to 6 am.
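A maintenance window such as 10 pm to 6 am spans midnight, so a simple start-to-end comparison is not sufficient. The following sketch shows one way to test membership; the function name and the half-open interval convention (start included, end excluded) are assumptions made for illustration.

```python
from datetime import time


def in_maintenance_window(now, start=time(22, 0), end=time(6, 0)):
    """Return True when `now` falls inside [start, end).

    Handles windows that wrap past midnight, such as 22:00 to 06:00."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


checks = [in_maintenance_window(t)
          for t in (time(22, 0), time(23, 30), time(5, 59), time(6, 0), time(12, 0))]
print(checks)
```

An off-line auto-learning task could consult such a check before starting each statement so that tuning stops when the window closes.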

The snapshot collection of the cursor cache can be performed by snapshot collector 320 at regular intervals, such as every 30 minutes, to snapshot performance statistics and save them in the workload repository 330, capturing information on high-load SQL. For example, the information on SQL that is saved in the repository can include the data needed to auto-tune the statement at a later time.

The high load SQL statements can be tuned within the maintenance window by the auto-learning optimizer 390. The high load extractor 340 identifies high load statements and retrieves the information on the high-load SQL statements captured in the workload repository 330 since the last off-line auto-learning session. A SQL Tuning Set (STS) 350 is created to store the set of high-load SQL statements. These statements can be ranked using their cumulative elapsed time and/or average elapsed time. Each statement in the STS is auto-tuned in ranked order by the auto-tuning optimizer 360. If a statement has been recently auto-tuned, it can be skipped. Once a SQL statement is auto tuned, the learned corrective actions (i.e., a profile) are permanently stored in the Tuning base 370, and retrieved by the query optimizer 380 to generate a well tuned execution plan for the statement.
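The extraction and ranking performed for the off-line mode can be sketched as a small aggregation over repository snapshots. The row layout `(timestamp, sql_id, elapsed_seconds)`, the integer timestamps, and the `build_tuning_set` function are hypothetical; a real extractor would likely also apply a high-load threshold, which is omitted here for brevity.

```python
from collections import defaultdict


def build_tuning_set(snapshots, last_session, recently_tuned):
    """Aggregate elapsed time per statement from snapshots taken after the
    last off-line session, rank by cumulative elapsed time, and skip
    statements that were auto-tuned recently."""
    totals = defaultdict(float)
    for timestamp, sql_id, elapsed in snapshots:
        if timestamp > last_session:
            totals[sql_id] += elapsed
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [sql_id for sql_id in ranked if sql_id not in recently_tuned]


# Timestamps are minutes on an arbitrary clock; values are illustrative.
snapshots = [
    (10, "q1", 5.0),    # before the last session, ignored
    (40, "q1", 30.0),
    (40, "q2", 50.0),
    (70, "q1", 25.0),
    (70, "q3", 10.0),
]
sts = build_tuning_set(snapshots, last_session=30, recently_tuned={"q3"})
print(sts)
```

Here "q1" accumulates 55 seconds across two snapshots and therefore ranks ahead of "q2", while "q3" is skipped because it was tuned recently.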

The auto-learning optimizer can optimize and execute an ad-hoc SQL statement the first time it comes into the system. The auto-learning optimizer can do this by learning about important properties of SQL statements such that the information learned from one statement can be used for another statement. For example, by analyzing the set of high-load SQL statements, the auto-learning optimizer can detect data skew between two columns and use this information to generate appropriate statistics such as either a multi-column histogram or a multi-dimensional histogram. Because the information learned from the ad-hoc SQL can become excessively large, making it impractical to collect the corresponding statistics, the past SQL execution history is used to identify and weed out many of the SQL properties that offer either one-time or insignificant performance gains.
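As an illustration of why a relationship between two columns matters, the sketch below compares the observed selectivity of a two-column predicate with the estimate obtained by assuming the columns are independent. This is a generic technique chosen for illustration, not necessarily the detection method of the described optimizer; the function name and the sample data are hypothetical.

```python
def correlation_factor(rows, value_a, value_b):
    """Ratio of the observed selectivity of (a = value_a AND b = value_b)
    to the selectivity predicted under column independence.

    A ratio far from 1.0 suggests that single-column statistics mislead the
    optimizer and that a multi-column histogram may be worth collecting."""
    n = len(rows)
    if n == 0:
        return 1.0
    count_a = sum(1 for a, _ in rows if a == value_a)
    count_b = sum(1 for _, b in rows if b == value_b)
    count_ab = sum(1 for a, b in rows if a == value_a and b == value_b)
    if count_a == 0 or count_b == 0:
        return 1.0
    independent = (count_a / n) * (count_b / n)
    return (count_ab / n) / independent


# Fully dependent columns: every "make-1" row has "model-1".
dependent = [("make-1", "model-1")] * 4 + [("make-2", "model-2")] * 4
# Independent columns: all four combinations occur equally often.
independent_rows = [(a, b) for a in ("x", "y") for b in ("p", "q")]

skewed = correlation_factor(dependent, "make-1", "model-1")
flat = correlation_factor(independent_rows, "x", "p")
print(skewed, flat)
```

In the dependent data set the independence assumption underestimates the matching rows by a factor of two, which is the kind of error extra statistics are intended to correct.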

FIG. 4 shows an example of the auto-learning optimizer system with a feedback loop for automatically learning tuning information on an ad hoc SQL statement basis. The cursor cache 410 is examined by the snapshot collector 420 to gather information that is stored in the workload repository 430. Using SQL information collected in the workload repository, the extractor 450 identifies high load statements and places them in a STS 460. The auto-learning optimizer 440 determines the appropriate set of complex statistics (such as multi-column and multi-table histograms and predicate statistics) to collect and refresh. The refreshed statistics 470 are added to the complex data statistics 480, which are then used by the query optimizer 490 to generate an execution plan for a SQL statement. The ad-hoc auto-learning process is therefore able to increase the number of statements in the SQL workload whose execution plans would benefit from these extra statistics while limiting the cost (time to collect and time to refresh) for these extra statistics.

According to one embodiment of the invention, computer system 500 performs specific operations by processor 504 executing one or more sequences of one or more instructions contained in system memory 506. Such instructions may be read into system memory 506 from another computer readable medium, such as static storage device 508 or disk drive 510. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.

The term “computer readable medium” as used herein refers to any medium that participates in providing instructions to processor 504 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 510. Volatile media includes dynamic memory, such as system memory 506. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 502. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.

Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer can read.

In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 500. According to other embodiments of the invention, two or more computer systems 500 coupled by communication link 520 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions to practice the invention in coordination with one another. Computer system 500 may transmit and receive messages, data, and instructions, including program, i.e., application code, through communication link 520 and communication interface 512. Received program code may be executed by processor 504 as it is received, and/or stored in disk drive 510, or other non-volatile storage for later execution.

In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.

1. A method comprising: gathering performance information about a workload; and automatically identifying a set of high-load database query language statements from the workload based on the performance information.

2. The method of claim 1, further comprising: automatically generating tuning actions for each high-load statement.

3. The method of claim 2, further comprising: automatically storing the tuning actions for each high-load statement in a profile.

4. The method of claim 3, further comprising: persistently storing each profile in a tuning base.

5. The method of claim 4, further comprising: receiving one of the high-load statements at a compiler; retrieving the profile for the received statement from the tuning base; and generating an execution plan for the statement with the profile.

6. The method of claim 1, wherein the database query language statement is a SQL statement.

7. A method comprising: gathering performance information about a workload; and automatically identifying a set of high-load database query language statements from the workload based on the performance information.

8. The method of claim 7, further comprising: automatically generating tuning actions for each high-load SQL statement.

9. The method of claim 8, further comprising: automatically storing the tuning actions for each high-load statement in a profile.

10. The method of claim 9, further comprising: persistently storing each profile in a tuning base.

11. The method of claim 10, further comprising: receiving one of the high-load statements at a compiler; retrieving the profile for the received statement from the tuning base; and generating an execution plan for the statement with the profile.

12. The method of claim 7, wherein the database query language statement is a SQL statement.

13. A method comprising: gathering performance information about a workload; and automatically identifying a set of high-load database query language statements from the workload based on the performance information.

14. The method of claim 13, further comprising: automatically generating tuning actions for each high-load statement.

15. The method of claim 14, further comprising: automatically storing the tuning actions for each high-load statement in a profile.

16. The method of claim 15, further comprising: persistently storing each profile in a tuning base.

17. The method of claim 16, further comprising: receiving one of the high-load statements at a compiler; retrieving the profile for the received statement from the tuning base; and generating an execution plan for the statement with the profile.

18. The method of claim 13, wherein the database query language statement is a SQL statement.